Session 1: Introduction & workflow overview

Advanced R-course 2025

Dr. Debasish Mukherjee, Dr. Ulrike Goebel, Dr. Ali Abdallah

Bioinformatics Core Facility CECAD

2025-11-21

Slides & Code

  • [f] Full screen
  • [o] Slide Overview
  • [c] Notes
  • [h] Help

git repo

R-Advanced


Clone repo

git clone https://github.com/CECADBioinformaticsCoreFacility/Advanced_R_course_2025.git


Slides Directly

https://cecadbioinformaticscorefacility.github.io/Advanced_R_course_2025/

Session 1 :: Introduction

Best Practices for RNA-Seq Analysis

flowchart TD
  A[[Hypothesis & Experiment Design]] --> B[[RNA preparation]]
  B --> C[[Library Generation]]
  C --> D[[Sequencing]]
  D --> E[[QC of Raw Data]]
  E --> F[[Read Alignment]]
  F --> G[[Quantification]]
  G --> H[[Differential Expression Analysis]]
  H --> I[[Functional Profiling]]

Best Practices for RNA-Seq Analysis: Hypothesis & Experiment Design

What scientific questions or applications am I interested in?

What experimental design will best address my questions?

  • Library Type
    • target RNA (e.g., whole transcriptome, mRNA, small RNA)
    • preparation method (e.g., stranded, non-stranded, poly(A)-selected)
    • sequencing type (e.g., single-end, paired-end)
  • Biological Conditions
    • Control vs Treatment
    • Time Points
  • Replicates
    • Biological Replicates
    • Technical Replicates

Important

Always consult with a bioinformatician during the experimental design phase!
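
To make the design choices above concrete, a sample sheet can be drafted before any RNA is extracted. The following is a minimal sketch in base R; the condition names, the time points and the choice of three biological replicates per group are illustrative assumptions, not recommendations from the course.

```r
# Hypothetical design: 2 conditions x 2 time points x 3 biological replicates.
# All factor levels below are placeholders for illustration.
design <- expand.grid(
  replicate = 1:3,
  time      = c("0h", "24h"),
  condition = c("control", "treatment"),
  stringsAsFactors = FALSE
)

# Put the reference level first so that "control" and "0h" are the baselines.
design$condition <- factor(design$condition, levels = c("control", "treatment"))
design$time      <- factor(design$time, levels = c("0h", "24h"))

# One unique, human-readable ID per sample.
design$sample_id <- paste(design$condition, design$time,
                          paste0("rep", design$replicate), sep = "_")

# Number of biological replicates per condition and time point.
table(design$condition, design$time)
```

A table like this can later serve as the sample metadata for the downstream analysis, so it is worth keeping it under `meta_data/`.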

Best Practices for RNA-Seq Analysis: RNA preparation

  • Randomization in sample selection
  • RNA extraction protocols
  • RNA Fragmentation
  • RNA integrity assessment
  • Avoiding contamination
  • Sample storage and handling
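
Randomization matters most when samples cannot all be processed together. The sketch below assigns samples to RNA extraction batches so that condition and batch are not confounded. The sample names, the group sizes and the number of batches are assumptions made for illustration.

```r
set.seed(2025)  # fixed seed so the assignment is reproducible

# Hypothetical samples: 6 control and 6 treatment.
samples <- data.frame(
  sample_id = sprintf("S%02d", 1:12),
  condition = rep(c("control", "treatment"), each = 6)
)

n_batches <- 3  # assumed number of extraction batches

# Shuffle within each condition, then deal the samples out to batches in turn,
# so that every batch receives the same number of samples per condition.
samples$batch <- NA_integer_
for (cond in unique(samples$condition)) {
  idx <- which(samples$condition == cond)
  shuffled <- idx[sample.int(length(idx))]
  samples$batch[shuffled] <- rep_len(seq_len(n_batches), length(idx))
}

# Check that condition and batch are not confounded.
table(samples$condition, samples$batch)
```

Randomizing within each condition (rather than across all samples at once) guarantees a balanced layout even for small experiments.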

Best Practices for RNA-Seq Analysis: Library Generation

  • Multiplexing
  • Spike-ins
  • Read Length
  • Read Depth
  • QC
    • RIN Score
    • RNA Concentration
    • 28S/18S Ratio
    • fragment size distribution
  • Tools: Bioanalyzer, TapeStation, Qubit
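
Multiplexing and read depth are linked by simple arithmetic: the reads delivered by one lane are shared among the pooled libraries. The sketch below illustrates this; the lane yield, the usable fraction and the target depth are hypothetical numbers, and the two helper functions are not part of any package.

```r
# Expected reads per sample when libraries are pooled evenly on one lane.
# usable_fraction is an assumed allowance for reads lost to filtering
# or failed demultiplexing.
reads_per_sample <- function(lane_reads, n_samples, usable_fraction = 1) {
  stopifnot(n_samples > 0, usable_fraction > 0, usable_fraction <= 1)
  lane_reads * usable_fraction / n_samples
}

# Largest number of samples that still reaches a target depth per sample.
max_samples <- function(lane_reads, target_depth, usable_fraction = 1) {
  floor(lane_reads * usable_fraction / target_depth)
}

# Hypothetical numbers: a lane yielding 400 million reads, 90% usable.
reads_per_sample(400e6, n_samples = 12, usable_fraction = 0.9)
max_samples(400e6, target_depth = 20e6, usable_fraction = 0.9)
```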

Best Practices for RNA-Seq Analysis: Sequencing

  • Sequencing Platform
    • Illumina
    • PacBio
    • Oxford Nanopore

Best Practices for RNA-Seq Analysis: QC of Raw Data

  • Base Sequence Quality
  • Per Sequence Quality Scores
  • Per Base Sequence Content
  • Per Base GC Content
  • Per Sequence GC Content
  • Per Base N Content
  • Sequence Length Distribution
  • Sequence Duplication Levels
  • Overrepresented Sequences
  • K-mer Content
  • Adapter Content
  • Tools: FastQC, MultiQC
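
The base quality metrics listed above are derived from the Phred scores stored in the quality line of each FASTQ record. As a minimal sketch, the code below decodes such a line, assuming Phred+33 encoding; the quality string is made up and the helper names are illustrative.

```r
# Decode a FASTQ quality string, assuming Phred+33 encoding.
phred_scores <- function(quality_string, offset = 33L) {
  utf8ToInt(quality_string) - offset
}

# Convert Phred scores to the probability that a base call is wrong:
# P = 10^(-Q / 10)
error_probability <- function(q) 10^(-q / 10)

qual <- "IIIIFFF##"   # made-up quality string
q <- phred_scores(qual)
q                      # 'I' decodes to 40, 'F' to 37, '#' to 2
round(error_probability(q), 4)
mean(q >= 30)          # fraction of bases at Q30 or better
```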

Best Practices for RNA-Seq Analysis: Read Alignment

  • Read Alignment
  • Reference Genome
  • QC
    • Alignment Rate
    • Coverage Uniformity
    • Duplicate Reads
    • Insert Size Distribution
    • Contamination Check
  • Tools: HISAT2, STAR, Salmon, Kallisto, RSeQC, MultiQC

Best Practices for RNA-Seq Analysis: Quantification

  • Annotation
  • Quantification
  • Tools: Salmon, Kallisto, featureCounts, RSeQC, MultiQC
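
Quantification produces per-gene (or per-transcript) counts, often alongside length-normalized abundances such as TPM. The sketch below shows how TPM relates to counts, using a toy matrix and assumed gene lengths. It is a simplification: tools such as Salmon work with effective transcript lengths rather than fixed gene lengths.

```r
# Toy count matrix: 3 genes x 2 samples (made-up numbers).
counts <- matrix(
  c(100, 200,
    300, 300,
    600, 500),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
gene_length_kb <- c(geneA = 1, geneB = 3, geneC = 2)  # assumed lengths in kb

# Transcripts per million: divide by length first, then scale each sample
# so that its values sum to one million.
counts_to_tpm <- function(counts, length_kb) {
  rate <- counts / length_kb          # reads per kilobase, gene by gene
  t(t(rate) / colSums(rate)) * 1e6
}

tpm <- counts_to_tpm(counts, gene_length_kb)
round(tpm)
colSums(tpm)  # each sample sums to one million
```

TPM values are convenient for comparing genes within a sample, whereas count-based differential expression tools expect the raw counts as input.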

Best Practices for RNA-Seq Analysis: Differential Expression Analysis

  • Normalization
  • Quality Control (sample-level and gene-level)
    • Batch Effect Assessment
    • Sample Outliers
    • Normalization Effectiveness
  • Modeling data
  • Source of variation (Design formula)
  • Shrinking estimates
  • Statistical testing
  • Multiple Testing Correction
  • Tools: DESeq2, edgeR, limma-voom
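
Three of the steps above can be illustrated in base R without installing any package: normalization, the design formula and multiple testing correction. The sketch below is not the DESeq2 implementation. It reproduces only the median-of-ratios idea behind DESeq2's default size factors, and all counts, sample annotations and p-values are made up.

```r
# Toy count matrix: 4 genes x 4 samples (made-up numbers).
counts <- matrix(
  c( 10,  20,  12,  24,
    100, 200,  95, 210,
     50, 100, 400, 820,
     30,  60,  28,  62),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)
coldata <- data.frame(
  condition = factor(c("control", "control", "treatment", "treatment"),
                     levels = c("control", "treatment")),
  batch     = factor(c("b1", "b2", "b1", "b2")),
  row.names = colnames(counts)
)

# Normalization: median-of-ratios size factors. Each sample is compared
# with a per-gene geometric mean across all samples.
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_mean <- rowMeans(log_counts)
  keep <- is.finite(log_geo_mean)      # drop genes with a zero count
  apply(log_counts[keep, , drop = FALSE], 2,
        function(x) exp(median(x - log_geo_mean[keep])))
}
sf <- size_factors(counts)
normalized <- t(t(counts) / sf)

# Sources of variation: the design formula expands to a model matrix.
design <- model.matrix(~ batch + condition, data = coldata)
colnames(design)

# Multiple testing correction: Benjamini-Hochberg adjusted p-values.
p_values <- c(0.001, 0.012, 0.03, 0.2, 0.6)  # made-up p-values
p.adjust(p_values, method = "BH")
```

Placing `batch` before `condition` in the formula follows the common convention of listing the variable of interest last.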

Best Practices for RNA-Seq Analysis: Functional Profiling

  • Over-Representation Analysis (ORA)
  • Gene Set Enrichment Analysis (GSEA)
  • Tools: clusterProfiler, g:Profiler, DAVID
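
At its core, ORA for a single gene set is a one-sided hypergeometric test. The sketch below shows that calculation in base R; the function name `ora_p_value` and all gene numbers are hypothetical. Dedicated tools additionally handle the gene set databases and the correction for testing many sets.

```r
# Over-representation analysis for one gene set as a one-sided
# hypergeometric test. All numbers are made up for illustration.
ora_p_value <- function(n_universe, n_in_set, n_de, n_overlap) {
  # P(X >= n_overlap) when drawing n_de genes from the universe
  phyper(n_overlap - 1, m = n_in_set, n = n_universe - n_in_set,
         k = n_de, lower.tail = FALSE)
}

p_ora <- ora_p_value(
  n_universe = 10000,  # genes tested (background)
  n_in_set   = 200,    # background genes annotated to the set
  n_de       = 500,    # differentially expressed genes
  n_overlap  = 25      # differentially expressed genes in the set
)
p_ora

# The same test expressed as Fisher's exact test on a 2x2 table
# (rows: in set / not in set; columns: DE / not DE).
tab <- matrix(c(25, 475, 175, 9325), nrow = 2)
fisher.test(tab, alternative = "greater")$p.value
```

The choice of background (`n_universe`) strongly affects the result; a common choice is the set of genes that were actually tested for differential expression.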

Best Practices for RNA-Seq Analysis

  • Users (i.e. PI/Students)
  • Sequencing Provider (i.e. CCG)
  • Initial Analysis Pipeline (i.e. nf-core/rnaseq)
  • Downstream Analysis (i.e. This Course)

Directory structure and Files

RNA-Seq_ProjectName/
├── data/
│   ├── raw_data/
│   ├── reference_data/
│   ├── meta_data/
│   └── processed_data/
│       ├── trimmed_data/
│       ├── alignments_data/
│       └── counts_data/
├── results/
│   ├── qc/
│   ├── differential_expression/
│   ├── functional_profiling/
│   └── final_figures/
├── reports/
├── scripts/
├── R/
├── logs/
└── README.md

raw_data/ : Sequencing data files (FASTQ format)

reference_data/ : Reference genome and annotation files

alignments_data/ : Aligned reads files (BAM/SAM format)

counts_data/ : Gene expression counts matrix
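
The layout above can be created from R in one go. The following is a minimal sketch using base R only; it builds the skeleton inside a temporary directory, so `root` is an assumption that should be replaced with a real project location.

```r
# Create the project skeleton. The project is created inside a temporary
# directory here; replace `root` with a real location in practice.
root <- file.path(tempdir(), "RNA-Seq_ProjectName")

subdirs <- c(
  "data/raw_data",
  "data/reference_data",
  "data/meta_data",
  "data/processed_data/trimmed_data",
  "data/processed_data/alignments_data",
  "data/processed_data/counts_data",
  "results/qc",
  "results/differential_expression",
  "results/functional_profiling",
  "results/final_figures",
  "reports", "scripts", "R", "logs"
)

# recursive = TRUE also creates the parent directories (data/, results/, ...).
for (d in file.path(root, subdirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Add an empty README stub unless one already exists.
readme <- file.path(root, "README.md")
if (!file.exists(readme)) {
  writeLines("# RNA-Seq_ProjectName", readme)
}

list.dirs(root, full.names = FALSE, recursive = TRUE)
```

Running the script a second time is harmless: existing directories are left untouched and the README is not overwritten.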